Introduction

This document summarises the simulation results presented in the manuscript Non-parametric efficient estimation of marginal structural models with multi-valued time-varying treatments, including all tables and figures.

The approximate true values are as follows:

I also generated the numerator of the stabilized weights using a multinomial model (with only prior treatment history as covariates), and we find similar results:

But the stabilized weights are much more stable than the non-stabilized ones:

##     true_p1          true_p2           true_p3           true_p4        
##  Min.   : 3.558   Min.   :  9.083   Min.   :  27.55   Min.   :   74.83  
##  1st Qu.: 4.280   1st Qu.: 17.339   1st Qu.:  70.59   1st Qu.:  296.69  
##  Median : 4.511   Median : 21.132   Median :  93.63   Median :  425.18  
##  Mean   : 4.999   Mean   : 25.011   Mean   : 125.03   Mean   :  625.67  
##  3rd Qu.: 5.327   3rd Qu.: 27.861   3rd Qu.: 143.18   3rd Qu.:  698.08  
##  Max.   :10.687   Max.   :196.530   Max.   :3613.96   Max.   :66456.64
##     true_p1          true_p2          true_p3          true_p4        
##  Min.   :0.7799   Min.   :0.3678   Min.   :0.1625   Min.   : 0.06445  
##  1st Qu.:0.9222   1st Qu.:0.7706   1st Qu.:0.6910   1st Qu.: 0.65517  
##  Median :0.9976   Median :0.9544   Median :0.9000   Median : 0.86337  
##  Mean   :1.0002   Mean   :1.0005   Mean   :1.0002   Mean   : 1.00011  
##  3rd Qu.:1.0488   3rd Qu.:1.1106   3rd Qu.:1.1259   3rd Qu.: 1.13853  
##  Max.   :1.3373   Max.   :3.1629   Max.   :7.0144   Max.   :15.21569
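The stabilization above can be illustrated with a self-contained sketch. This is a toy two-period setup with a binary treatment (a deliberate simplification of the manuscript's multi-valued treatment; the variable names and data-generating process are hypothetical): the denominator models condition on the confounders, while the numerator models use only prior treatment history, as described above.

```r
set.seed(1)
n <- 5000
# Toy two-period setup: confounder L, binary treatment A (illustration only)
L1 <- rnorm(n)
A1 <- rbinom(n, 1, plogis(0.5 * L1))
L2 <- rnorm(n, mean = 0.3 * A1)
A2 <- rbinom(n, 1, plogis(0.5 * L2 + 0.4 * A1))

# Denominator: treatment probabilities given the full history (confounders in)
d1 <- glm(A1 ~ L1, family = binomial)
d2 <- glm(A2 ~ L2 + A1, family = binomial)
p_d1 <- ifelse(A1 == 1, fitted(d1), 1 - fitted(d1))
p_d2 <- ifelse(A2 == 1, fitted(d2), 1 - fitted(d2))

# Numerator: treatment probabilities given prior treatment history only
m1 <- glm(A1 ~ 1, family = binomial)
m2 <- glm(A2 ~ A1, family = binomial)
p_n1 <- ifelse(A1 == 1, fitted(m1), 1 - fitted(m1))
p_n2 <- ifelse(A2 == 1, fitted(m2), 1 - fitted(m2))

w  <- 1 / (p_d1 * p_d2)  # non-stabilized weights: large mean, heavy right tail
sw <- p_n1 * p_n2 * w    # stabilized weights: mean ~ 1, much tighter spread
summary(w)
summary(sw)
```

Since the numerator probabilities are below one, each stabilized weight is pointwise smaller than its non-stabilized counterpart, which is why the second summary block above is so much tighter around 1.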

Of note, I also recorded the expected bias that would be observed (by estimating the treatment probabilities the same way we do in the simulations); it leads to the following values: 1.1510768, -0.0609292.
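To illustrate what such an "expected bias" captures, here is a minimal sketch under a toy single-time-point setup (the data-generating process, `truth`, and the `ipw` helper are all hypothetical, not the manuscript's): fit the treatment model correctly and incorrectly on a large sample, and compare the resulting IPW estimates to the truth.

```r
set.seed(4)
N <- 1e5                        # large sample to approximate the limiting value
L <- rnorm(N)
A <- rbinom(N, 1, plogis(L))
Y <- A + L + rnorm(N)
truth <- 1                      # true marginal treatment effect in this toy model

# Hajek-style IPW contrast given treatment probabilities p = P(A = 1 | history)
ipw <- function(p) {
  w <- ifelse(A == 1, 1 / p, 1 / (1 - p))
  weighted.mean(Y[A == 1], w[A == 1]) - weighted.mean(Y[A == 0], w[A == 0])
}

p_good <- fitted(glm(A ~ L, family = binomial))  # correctly specified
p_bad  <- rep(mean(A), N)                        # misspecified: ignores L

c(expected_bias_good = ipw(p_good) - truth,
  expected_bias_bad  = ipw(p_bad)  - truth)
```

The correctly specified model yields an expected bias near zero, while the misspecified one settles at a clearly nonzero value, which is the kind of quantity the two numbers above record.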

Simulation results

This section is split into two parts because the initial choice of learners led to a very slow run time, and it seemed I would not get results in a reasonable amount of time. I therefore switched to a more basic set of learners to get results faster. The first section reflects this limited set of learners; the second reflects the extended set.

Quick runs

Run with 200 iterations at each sample size.

stackr = list("mean", "lightgbm", "multinom", "xgboost", "nnet",
              "knn", "rpart", "naivebayes", "glmnet",
              list("randomforest", ntree = 250, id = "randomforest"),
              list("ranger", num.trees = 250, id = "ranger"))

stackm = c("SL.mean", "SL.glmnet", "SL.earth", "SL.glm.interaction")
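The run structure itself (200 draws of fresh data at each sample size, with the estimators refit each time) can be sketched as below. The `rnorm()` data-generating process and the sample-mean estimator are placeholders standing in for the manuscript's setup and the SDR/TMLE fits built from the stacks above.

```r
set.seed(2)
n_iter <- 200
sizes  <- c(250, 500, 1000, 2000)

# One estimate per (iteration, sample size); mean() is a placeholder estimator
draws <- expand.grid(iter = seq_len(n_iter), n = sizes)
draws$psi <- vapply(draws$n, function(n) mean(rnorm(n)), numeric(1))

# Per-size Monte Carlo summary of the estimates
aggregate(psi ~ n, data = draws, FUN = function(x) c(mean = mean(x), sd = sd(x)))
```

The per-size spread shrinking roughly like 1/sqrt(n) is the pattern the convergence plots below are checking for.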

U1 results

  • Scenario 1: As sample size increases, both SDR and TMLE converge towards the true value.
  • Scenario 2: (Weights misspecified) SDR seems to converge slowly towards the true value (and may eventually get there). TMLE, on the other hand, seems to struggle to converge towards the true value, which is quite surprising.
  • Scenario 3: (Outcome misspecified) Starts far off for some reason, but converges towards the true value at a fast rate.
  • Scenario 4: (Weights misspecified early and outcome misspecified late) Similar start to scenario 3; converges quickly for SDR but not for TMLE.
  • Scenario 5: (Weights misspecified late and outcome misspecified early) Performs a bit less well than scenario 1.

Takeaways: Results are overall good for SDR, but less so for TMLE. The weights are clearly very well estimated once we reach a sufficient sample size. The outcome model probably needs a little improvement to get consistent results in scenario 2.

U1 distributions

Bias of U1

Convergence of U1

\(\hat{\beta}\) results

  • Scenario 1: As sample size increases, both SDR and TMLE converge towards the true value. We get consistency (in bias and mean squared error) and good coverage for both estimators. The variance of the estimators is a bit better than IPW's.
  • Scenario 2: (Weights misspecified) Similar to the U1 results for scenario 2, SDR seems to converge slowly towards the true value (and appears to stagnate at a small bias). TMLE, on the other hand, seems to struggle to converge towards the true value, which is quite surprising. SDR is not consistent here, but the bias is not severe.
  • Scenario 3: (Outcome misspecified) Consistent for both TMLE and SDR, even catching up to the performance of IPW, at the cost of higher variance in the estimates.
  • Scenario 4: (Weights misspecified early and outcome misspecified late) SDR is consistent and well behaved. TMLE fails in this scenario, which is expected.
  • Scenario 5: (Weights misspecified late and outcome misspecified early) Doesn't perform as well as expected, but is close to consistent for both SDR and TMLE.

Takeaways: Results are overall good for SDR, but less so for TMLE. Perhaps improving the outcome model would help in some scenarios.

Solution distribution

Mean Bias

Mean Bias * sqrt(n)

Median Bias

MSE

MSE * n

Variance of the estimates

Coverage
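The headings above correspond to standard Monte Carlo summaries. As a self-contained sketch (toy normal draws standing in for the per-iteration estimates; the `metrics` helper is hypothetical), they can be computed from the vector of estimates at a given sample size:

```r
# Summaries for a vector of estimates `psi` with standard errors `se`,
# obtained from repeated simulations at sample size n
metrics <- function(psi, se, truth, n) {
  err <- psi - truth
  c(mean_bias        = mean(err),
    mean_bias_sqrt_n = mean(err) * sqrt(n),
    median_bias      = median(err),
    mse              = mean(err^2),
    mse_n            = mean(err^2) * n,  # stabilizes if root-n consistent
    variance         = var(psi),
    coverage         = mean(abs(err) <= qnorm(0.975) * se))
}

set.seed(3)
n   <- 1000
psi <- rnorm(200, mean = 0, sd = 1 / sqrt(n))  # toy root-n consistent estimates
se  <- rep(1 / sqrt(n), 200)
round(metrics(psi, se, truth = 0, n = n), 4)
```

The scaled quantities are the diagnostic ones: mean bias times sqrt(n) and MSE times n should level off across sample sizes for a root-n consistent estimator, which is what the corresponding plots check.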

Figure 1 manuscript